Overview

Dataset statistics

Number of variables: 40
Number of observations: 59400
Missing cells: 46094
Missing cells (%): 1.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 18.1 MiB
Average record size in memory: 320.0 B
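
The derived figures in this table can be cross-checked from the raw counts. The sketch below assumes that "Missing cells (%)" is the missing-cell count divided by variables times observations, and that the total size is the average record size times the number of observations; both are assumptions about how the report rounds and reports these values, not a description of the tool's internals.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 40
n_observations = 59400
missing_cells = 46094
avg_record_bytes = 320.0

# Assumption: every variable/observation pair counts as one cell.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total size = average record size * number of records (1 MiB = 2**20 B).
total_mib = avg_record_bytes * n_observations / 2**20

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Under these assumptions both printed values agree with the reported 1.9% and 18.1 MiB.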

Variable types

CAT: 28
NUM: 10
BOOL: 2

Reproduction

Analysis started: 2020-07-08 23:06:53.942813
Analysis finished: 2020-07-08 23:07:22.418313
Duration: 28.48 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

recorded_by has constant value "GeoData Consultants Ltd" Constant
date_recorded has a high cardinality: 356 distinct values High cardinality
funder has a high cardinality: 1897 distinct values High cardinality
installer has a high cardinality: 2145 distinct values High cardinality
wpt_name has a high cardinality: 37400 distinct values High cardinality
subvillage has a high cardinality: 19287 distinct values High cardinality
lga has a high cardinality: 125 distinct values High cardinality
ward has a high cardinality: 2092 distinct values High cardinality
scheme_name has a high cardinality: 2696 distinct values High cardinality
extraction_type_group is highly correlated with extraction_type and 1 other fields High correlation
extraction_type is highly correlated with extraction_type_group and 1 other fields High correlation
extraction_type_class is highly correlated with extraction_type and 1 other fields High correlation
management_group is highly correlated with management High correlation
management is highly correlated with management_group High correlation
payment_type is highly correlated with payment High correlation
payment is highly correlated with payment_type High correlation
quality_group is highly correlated with water_quality High correlation
water_quality is highly correlated with quality_group High correlation
quantity_group is highly correlated with quantity High correlation
quantity is highly correlated with quantity_group High correlation
source_type is highly correlated with source and 1 other fields High correlation
source is highly correlated with source_type and 1 other fields High correlation
source_class is highly correlated with source and 1 other fields High correlation
waterpoint_type_group is highly correlated with waterpoint_type High correlation
waterpoint_type is highly correlated with waterpoint_type_group High correlation
funder has 3635 (6.1%) missing values Missing
installer has 3655 (6.2%) missing values Missing
public_meeting has 3334 (5.6%) missing values Missing
scheme_management has 3877 (6.5%) missing values Missing
scheme_name has 28166 (47.4%) missing values Missing
permit has 3056 (5.1%) missing values Missing
amount_tsh is highly skewed (γ1 = 57.80779995) Skewed
num_private is highly skewed (γ1 = 91.93374999) Skewed
id has unique values Unique
amount_tsh has 41639 (70.1%) zeros Zeros
gps_height has 20438 (34.4%) zeros Zeros
longitude has 1812 (3.1%) zeros Zeros
num_private has 58643 (98.7%) zeros Zeros
population has 21381 (36.0%) zeros Zeros
construction_year has 20709 (34.9%) zeros Zeros

Variables

id
Real number (ℝ≥0)

UNIQUE

Distinct count: 59400
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37115.131767676765
Minimum: 0
Maximum: 74247
Zeros: 1
Zeros (%): < 0.1%
Memory size: 464.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3730.9
Q1: 18519.75
median: 37061.5
Q3: 55656.5
95-th percentile: 70564.05
Maximum: 74247
Range: 74247
Interquartile range (IQR): 37136.75

Descriptive statistics

Standard deviation: 21453.12837
Coefficient of variation (CV): 0.5780156866
Kurtosis: -1.201515029
Mean: 37115.13177
Median Absolute Deviation (MAD): 18568.5
Skewness: 0.00262253035
Sum: 2204638827
Variance: 460236716.9

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
2047 | 1 | < 0.1%
72310 | 1 | < 0.1%
49805 | 1 | < 0.1%
51852 | 1 | < 0.1%
62091 | 1 | < 0.1%
64138 | 1 | < 0.1%
57993 | 1 | < 0.1%
60040 | 1 | < 0.1%
33413 | 1 | < 0.1%
35460 | 1 | < 0.1%
Other values (59390) | 59390 | > 99.9%

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%

Value | Count | Frequency (%)
74247 | 1 | < 0.1%
74246 | 1 | < 0.1%
74243 | 1 | < 0.1%
74242 | 1 | < 0.1%
74240 | 1 | < 0.1%

amount_tsh
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 98
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 317.6503846801347
Minimum: 0.0
Maximum: 350000.0
Zeros: 41639
Zeros (%): 70.1%
Memory size: 464.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 20
95-th percentile: 1200
Maximum: 350000
Range: 350000
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 2997.574558
Coefficient of variation (CV): 9.436709989
Kurtosis: 4903.543102
Mean: 317.6503847
Median Absolute Deviation (MAD): 0
Skewness: 57.80779995
Sum: 18868432.85
Variance: 8985453.232

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 41639 | 70.1%
500 | 3102 | 5.2%
50 | 2472 | 4.2%
1000 | 1488 | 2.5%
20 | 1463 | 2.5%
200 | 1220 | 2.1%
100 | 816 | 1.4%
10 | 806 | 1.4%
30 | 743 | 1.3%
2000 | 704 | 1.2%
Other values (88) | 4947 | 8.3%

Value | Count | Frequency (%)
0 | 41639 | 70.1%
0.2 | 3 | < 0.1%
0.25 | 1 | < 0.1%
1 | 3 | < 0.1%
2 | 13 | < 0.1%

Value | Count | Frequency (%)
350000 | 1 | < 0.1%
250000 | 1 | < 0.1%
200000 | 1 | < 0.1%
170000 | 1 | < 0.1%
138000 | 1 | < 0.1%

date_recorded
Categorical

HIGH CARDINALITY

Distinct count: 356
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
2011-03-15 | 572 | 1.0%
2011-03-17 | 558 | 0.9%
2013-02-03 | 546 | 0.9%
2011-03-14 | 520 | 0.9%
2011-03-16 | 513 | 0.9%
2011-03-18 | 497 | 0.8%
2011-03-19 | 466 | 0.8%
2013-02-04 | 464 | 0.8%
2013-01-29 | 459 | 0.8%
2011-03-04 | 458 | 0.8%
Other values (346) | 54347 | 91.5%

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

funder
Categorical

HIGH CARDINALITY
MISSING

Distinct count: 1897
Unique (%): 3.4%
Missing: 3635
Missing (%): 6.1%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
Government Of Tanzania | 9084 | 15.3%
Danida | 3114 | 5.2%
Hesawa | 2202 | 3.7%
Rwssp | 1374 | 2.3%
World Bank | 1349 | 2.3%
Kkkt | 1287 | 2.2%
World Vision | 1246 | 2.1%
Unicef | 1057 | 1.8%
Tasaf | 877 | 1.5%
District Council | 843 | 1.4%
Other values (1887) | 33332 | 56.1%
(Missing) | 3635 | 6.1%

Length

Max length: 30
Median length: 6
Mean length: 9.505824916
Min length: 1

gps_height
Real number (ℝ)

ZEROS

Distinct count: 2428
Unique (%): 4.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 668.297239057239
Minimum: -90
Maximum: 2770
Zeros: 20438
Zeros (%): 34.4%
Memory size: 464.1 KiB

Quantile statistics

Minimum: -90
5-th percentile: 0
Q1: 0
median: 369
Q3: 1319.25
95-th percentile: 1797
Maximum: 2770
Range: 2860
Interquartile range (IQR): 1319.25

Descriptive statistics

Standard deviation: 693.1163503
Coefficient of variation (CV): 1.037137833
Kurtosis: -1.292440135
Mean: 668.2972391
Median Absolute Deviation (MAD): 369
Skewness: 0.462402085
Sum: 39696856
Variance: 480410.2751

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 20438 | 34.4%
-15 | 60 | 0.1%
-16 | 55 | 0.1%
-13 | 55 | 0.1%
-20 | 52 | 0.1%
1290 | 52 | 0.1%
-14 | 51 | 0.1%
303 | 51 | 0.1%
-18 | 49 | 0.1%
-19 | 47 | 0.1%
Other values (2418) | 38490 | 64.8%

Value | Count | Frequency (%)
-90 | 1 | < 0.1%
-63 | 2 | < 0.1%
-59 | 1 | < 0.1%
-57 | 1 | < 0.1%
-55 | 1 | < 0.1%

Value | Count | Frequency (%)
2770 | 1 | < 0.1%
2628 | 1 | < 0.1%
2627 | 1 | < 0.1%
2626 | 2 | < 0.1%
2623 | 1 | < 0.1%

installer
Categorical

HIGH CARDINALITY
MISSING

Distinct count: 2145
Unique (%): 3.8%
Missing: 3655
Missing (%): 6.2%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
DWE | 17402 | 29.3%
Government | 1825 | 3.1%
RWE | 1206 | 2.0%
Commu | 1060 | 1.8%
DANIDA | 1050 | 1.8%
KKKT | 898 | 1.5%
Hesawa | 840 | 1.4%
0 | 777 | 1.3%
TCRS | 707 | 1.2%
Central government | 622 | 1.0%
Other values (2135) | 29358 | 49.4%
(Missing) | 3655 | 6.2%

Length

Max length: 30
Median length: 4
Mean length: 5.91976431
Min length: 1

longitude
Real number (ℝ≥0)

ZEROS

Distinct count: 57516
Unique (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34.077426692028794
Minimum: 0.0
Maximum: 40.34519307
Zeros: 1812
Zeros (%): 3.1%
Memory size: 464.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 30.04066001
Q1: 33.09034738
median: 34.90874343
Q3: 37.17838657
95-th percentile: 39.13323954
Maximum: 40.34519307
Range: 40.34519307
Interquartile range (IQR): 4.08803919

Descriptive statistics

Standard deviation: 6.567431846
Coefficient of variation (CV): 0.1927208854
Kurtosis: 19.18703105
Mean: 34.07742669
Median Absolute Deviation (MAD): 2.032511095
Skewness: -4.191046455
Sum: 2024199.146
Variance: 43.13116105

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 1812 | 3.1%
37.54090064 | 2 | < 0.1%
33.01050977 | 2 | < 0.1%
39.09348389 | 2 | < 0.1%
32.9727187 | 2 | < 0.1%
33.00627548 | 2 | < 0.1%
39.10395018 | 2 | < 0.1%
37.54278497 | 2 | < 0.1%
36.80248988 | 2 | < 0.1%
39.09837398 | 2 | < 0.1%
Other values (57506) | 57570 | 96.9%

Value | Count | Frequency (%)
0 | 1812 | 3.1%
29.6071219 | 1 | < 0.1%
29.60720109 | 1 | < 0.1%
29.61032056 | 1 | < 0.1%
29.61096482 | 1 | < 0.1%

Value | Count | Frequency (%)
40.34519307 | 1 | < 0.1%
40.34430089 | 1 | < 0.1%
40.32523996 | 1 | < 0.1%
40.32522643 | 1 | < 0.1%
40.32340181 | 1 | < 0.1%

latitude
Real number (ℝ)

Distinct count: 57517
Unique (%): 96.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -5.706032659626431
Minimum: -11.64944018
Maximum: -2e-08
Zeros: 0
Zeros (%): 0.0%
Memory size: 464.1 KiB

Quantile statistics

Minimum: -11.64944018
5-th percentile: -10.58554992
Q1: -8.540621305
median: -5.02159665
Q3: -3.32615564
95-th percentile: -1.408872227
Maximum: -2e-08
Range: 11.64944016
Interquartile range (IQR): 5.214465665

Descriptive statistics

Standard deviation: 2.946019081
Coefficient of variation (CV): -0.5162990219
Kurtosis: -1.057616666
Mean: -5.70603266
Median Absolute Deviation (MAD): 2.07002988
Skewness: -0.1520365709
Sum: -338938.34
Variance: 8.679028427

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
-2e-08 | 1812 | 3.1%
-6.98584173 | 2 | < 0.1%
-3.79757861 | 2 | < 0.1%
-6.98188419 | 2 | < 0.1%
-7.10462503 | 2 | < 0.1%
-7.05692253 | 2 | < 0.1%
-7.17517443 | 2 | < 0.1%
-6.99073094 | 2 | < 0.1%
-6.9787555 | 2 | < 0.1%
-6.99470401 | 2 | < 0.1%
Other values (57507) | 57570 | 96.9%

Value | Count | Frequency (%)
-11.64944018 | 1 | < 0.1%
-11.64837759 | 1 | < 0.1%
-11.58629656 | 1 | < 0.1%
-11.56857679 | 1 | < 0.1%
-11.56680457 | 1 | < 0.1%

Value | Count | Frequency (%)
-2e-08 | 1812 | 3.1%
-0.99846435 | 1 | < 0.1%
-0.998916 | 1 | < 0.1%
-0.99901209 | 1 | < 0.1%
-0.99911702 | 1 | < 0.1%

wpt_name
Categorical

HIGH CARDINALITY

Distinct count: 37400
Unique (%): 63.0%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
none | 3563 | 6.0%
Shuleni | 1748 | 2.9%
Zahanati | 830 | 1.4%
Msikitini | 535 | 0.9%
Kanisani | 323 | 0.5%
Bombani | 271 | 0.5%
Sokoni | 260 | 0.4%
Ofisini | 254 | 0.4%
School | 208 | 0.4%
Shule Ya Msingi | 199 | 0.3%
Other values (37390) | 51209 | 86.2%

Length

Max length: 30
Median length: 10
Mean length: 10.96210438
Min length: 1

num_private
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 65
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.47414141414141414
Minimum: 0
Maximum: 1776
Zeros: 58643
Zeros (%): 98.7%
Memory size: 464.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 1776
Range: 1776
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 12.23622981
Coefficient of variation (CV): 25.80713147
Kurtosis: 11137.29521
Mean: 0.4741414141
Median Absolute Deviation (MAD): 0
Skewness: 91.93374999
Sum: 28164
Variance: 149.72532

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 58643 | 98.7%
6 | 81 | 0.1%
1 | 73 | 0.1%
5 | 46 | 0.1%
8 | 46 | 0.1%
32 | 40 | 0.1%
45 | 36 | 0.1%
15 | 35 | 0.1%
39 | 30 | 0.1%
93 | 28 | < 0.1%
Other values (55) | 342 | 0.6%

Value | Count | Frequency (%)
0 | 58643 | 98.7%
1 | 73 | 0.1%
2 | 23 | < 0.1%
3 | 27 | < 0.1%
4 | 20 | < 0.1%

Value | Count | Frequency (%)
1776 | 1 | < 0.1%
1402 | 1 | < 0.1%
755 | 1 | < 0.1%
698 | 1 | < 0.1%
672 | 1 | < 0.1%

basin
Categorical

Distinct count: 9
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
Lake Victoria | 10248 | 17.3%
Pangani | 8940 | 15.1%
Rufiji | 7976 | 13.4%
Internal | 7785 | 13.1%
Lake Tanganyika | 6432 | 10.8%
Wami / Ruvu | 5987 | 10.1%
Lake Nyasa | 5085 | 8.6%
Ruvuma / Southern Coast | 4493 | 7.6%
Lake Rukwa | 2454 | 4.1%

Length

Max length: 23
Median length: 10
Mean length: 10.8923569
Min length: 6

subvillage
Categorical

HIGH CARDINALITY

Distinct count: 19287
Unique (%): 32.7%
Missing: 371
Missing (%): 0.6%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
Madukani | 508 | 0.9%
Shuleni | 506 | 0.9%
Majengo | 502 | 0.8%
Kati | 373 | 0.6%
Mtakuja | 262 | 0.4%
Sokoni | 232 | 0.4%
M | 187 | 0.3%
Muungano | 172 | 0.3%
Mbuyuni | 164 | 0.3%
Mlimani | 152 | 0.3%
Other values (19277) | 55971 | 94.2%
(Missing) | 371 | 0.6%

Length

Max length: 30
Median length: 7
Mean length: 7.867003367
Min length: 1

region
Categorical

Distinct count: 21
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
Iringa | 5294 | 8.9%
Shinyanga | 4982 | 8.4%
Mbeya | 4639 | 7.8%
Kilimanjaro | 4379 | 7.4%
Morogoro | 4006 | 6.7%
Arusha | 3350 | 5.6%
Kagera | 3316 | 5.6%
Mwanza | 3102 | 5.2%
Kigoma | 2816 | 4.7%
Ruvuma | 2640 | 4.4%
Other values (11) | 20876 | 35.1%

Length

Max length: 13
Median length: 6
Mean length: 6.623754209
Min length: 4

region_code
Real number (ℝ≥0)

Distinct count: 27
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.297003367003366
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Memory size: 464.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 5
median: 12
Q3: 17
95-th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 17.58740634
Coefficient of variation (CV): 1.149728866
Kurtosis: 10.28843341
Mean: 15.29700337
Median Absolute Deviation (MAD): 6
Skewness: 3.17381811
Sum: 908642
Variance: 309.3168617

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
11 | 5300 | 8.9%
17 | 5011 | 8.4%
12 | 4639 | 7.8%
3 | 4379 | 7.4%
5 | 4040 | 6.8%
18 | 3324 | 5.6%
19 | 3047 | 5.1%
2 | 3024 | 5.1%
16 | 2816 | 4.7%
10 | 2640 | 4.4%
Other values (17) | 21180 | 35.7%

Value | Count | Frequency (%)
1 | 2201 | 3.7%
2 | 3024 | 5.1%
3 | 4379 | 7.4%
4 | 2513 | 4.2%
5 | 4040 | 6.8%

Value | Count | Frequency (%)
99 | 423 | 0.7%
90 | 917 | 1.5%
80 | 1238 | 2.1%
60 | 1025 | 1.7%
40 | 1 | < 0.1%

district_code
Real number (ℝ≥0)

Distinct count: 20
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.629747474747475
Minimum: 0
Maximum: 80
Zeros: 23
Zeros (%): < 0.1%
Memory size: 464.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 30
Maximum: 80
Range: 80
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 9.633648629
Coefficient of variation (CV): 1.711204396
Kurtosis: 16.21428363
Mean: 5.629747475
Median Absolute Deviation (MAD): 1
Skewness: 3.962045299
Sum: 334407
Variance: 92.80718592

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
1 | 12203 | 20.5%
2 | 11173 | 18.8%
3 | 9998 | 16.8%
4 | 8999 | 15.1%
5 | 4356 | 7.3%
6 | 4074 | 6.9%
7 | 3343 | 5.6%
8 | 1043 | 1.8%
30 | 995 | 1.7%
33 | 874 | 1.5%
Other values (10) | 2342 | 3.9%

Value | Count | Frequency (%)
0 | 23 | < 0.1%
1 | 12203 | 20.5%
2 | 11173 | 18.8%
3 | 9998 | 16.8%
4 | 8999 | 15.1%

Value | Count | Frequency (%)
80 | 12 | < 0.1%
67 | 6 | < 0.1%
63 | 195 | 0.3%
62 | 109 | 0.2%
60 | 63 | 0.1%

lga
Categorical

HIGH CARDINALITY

Distinct count: 125
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
Njombe | 2503 | 4.2%
Arusha Rural | 1252 | 2.1%
Moshi Rural | 1251 | 2.1%
Bariadi | 1177 | 2.0%
Rungwe | 1106 | 1.9%
Kilosa | 1094 | 1.8%
Kasulu | 1047 | 1.8%
Mbozi | 1034 | 1.7%
Meru | 1009 | 1.7%
Bagamoyo | 997 | 1.7%
Other values (115) | 46930 | 79.0%

Length

Max length: 16
Median length: 6
Mean length: 7.416885522
Min length: 3

ward
Categorical

HIGH CARDINALITY

Distinct count: 2092
Unique (%): 3.5%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
Igosi | 307 | 0.5%
Imalinyi | 252 | 0.4%
Siha Kati | 232 | 0.4%
Mdandu | 231 | 0.4%
Nduruma | 217 | 0.4%
Mishamo | 203 | 0.3%
Kitunda | 203 | 0.3%
Msindo | 201 | 0.3%
Chalinze | 196 | 0.3%
Maji ya Chai | 190 | 0.3%
Other values (2082) | 57168 | 96.2%

Length

Max length: 23
Median length: 7
Mean length: 7.505841751
Min length: 3

population
Real number (ℝ≥0)

ZEROS

Distinct count: 1049
Unique (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179.90998316498317
Minimum: 0
Maximum: 30500
Zeros: 21381
Zeros (%): 36.0%
Memory size: 464.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 25
Q3: 215
95-th percentile: 680
Maximum: 30500
Range: 30500
Interquartile range (IQR): 215

Descriptive statistics

Standard deviation: 471.4821757
Coefficient of variation (CV): 2.620655994
Kurtosis: 402.2801153
Mean: 179.9099832
Median Absolute Deviation (MAD): 25
Skewness: 12.66071359
Sum: 10686653
Variance: 222295.442

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 21381 | 36.0%
1 | 7025 | 11.8%
200 | 1940 | 3.3%
150 | 1892 | 3.2%
250 | 1681 | 2.8%
300 | 1476 | 2.5%
100 | 1146 | 1.9%
50 | 1139 | 1.9%
500 | 1009 | 1.7%
350 | 986 | 1.7%
Other values (1039) | 19725 | 33.2%

Value | Count | Frequency (%)
0 | 21381 | 36.0%
1 | 7025 | 11.8%
2 | 4 | < 0.1%
3 | 4 | < 0.1%
4 | 13 | < 0.1%

Value | Count | Frequency (%)
30500 | 1 | < 0.1%
15300 | 1 | < 0.1%
11463 | 1 | < 0.1%
10000 | 3 | < 0.1%
9865 | 1 | < 0.1%

public_meeting
Boolean

MISSING

Distinct count: 2
Unique (%): < 0.1%
Missing: 3334
Missing (%): 5.6%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
True | 51011 | 85.9%
False | 5055 | 8.5%
(Missing) | 3334 | 5.6%

recorded_by
Categorical

CONSTANT
REJECTED

Distinct count: 1
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
GeoData Consultants Ltd | 59400 | 100.0%

Length

Max length: 23
Median length: 23
Mean length: 23
Min length: 23

scheme_management
Categorical

MISSING

Distinct count: 12
Unique (%): < 0.1%
Missing: 3877
Missing (%): 6.5%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
VWC | 36793 | 61.9%
WUG | 5206 | 8.8%
Water authority | 3153 | 5.3%
WUA | 2883 | 4.9%
Water Board | 2748 | 4.6%
Parastatal | 1680 | 2.8%
Private operator | 1063 | 1.8%
Company | 1061 | 1.8%
Other | 766 | 1.3%
SWC | 97 | 0.2%
Other values (2) | 73 | 0.1%
(Missing) | 3877 | 6.5%

Length

Max length: 16
Median length: 3
Mean length: 4.537373737
Min length: 3

scheme_name
Categorical

HIGH CARDINALITY
MISSING

Distinct count: 2696
Unique (%): 8.6%
Missing: 28166
Missing (%): 47.4%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
K | 682 | 1.1%
None | 644 | 1.1%
Borehole | 546 | 0.9%
Chalinze wate | 405 | 0.7%
M | 400 | 0.7%
DANIDA | 379 | 0.6%
Government | 320 | 0.5%
Ngana water supplied scheme | 270 | 0.5%
wanging'ombe water supply s | 261 | 0.4%
wanging'ombe supply scheme | 234 | 0.4%
Other values (2686) | 27093 | 45.6%
(Missing) | 28166 | 47.4%

Length

Max length: 46
Median length: 3
Mean length: 8.94456229
Min length: 1

permit
Boolean

MISSING

Distinct count: 2
Unique (%): < 0.1%
Missing: 3056
Missing (%): 5.1%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
True | 38852 | 65.4%
False | 17492 | 29.4%
(Missing) | 3056 | 5.1%

construction_year
Real number (ℝ≥0)

ZEROS

Distinct count: 55
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1300.6524747474748
Minimum: 0
Maximum: 2013
Zeros: 20709
Zeros (%): 34.9%
Memory size: 464.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1986
Q3: 2004
95-th percentile: 2010
Maximum: 2013
Range: 2013
Interquartile range (IQR): 2004

Descriptive statistics

Standard deviation: 951.6205473
Coefficient of variation (CV): 0.7316485885
Kurtosis: -1.596432369
Mean: 1300.652475
Median Absolute Deviation (MAD): 22
Skewness: -0.6349277866
Sum: 77258757
Variance: 905581.6661

Histogram with fixed size bins (bins=10)

Value | Count | Frequency (%)
0 | 20709 | 34.9%
2010 | 2645 | 4.5%
2008 | 2613 | 4.4%
2009 | 2533 | 4.3%
2000 | 2091 | 3.5%
2007 | 1587 | 2.7%
2006 | 1471 | 2.5%
2003 | 1286 | 2.2%
2011 | 1256 | 2.1%
2004 | 1123 | 1.9%
Other values (45) | 22086 | 37.2%

Value | Count | Frequency (%)
0 | 20709 | 34.9%
1960 | 102 | 0.2%
1961 | 21 | < 0.1%
1962 | 30 | 0.1%
1963 | 85 | 0.1%

Value | Count | Frequency (%)
2013 | 176 | 0.3%
2012 | 1084 | 1.8%
2011 | 1256 | 2.1%
2010 | 2645 | 4.5%
2009 | 2533 | 4.3%

extraction_type
Categorical

HIGH CORRELATION

Distinct count: 18
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
gravity | 26780 | 45.1%
nira/tanira | 8154 | 13.7%
other | 6430 | 10.8%
submersible | 4764 | 8.0%
swn 80 | 3670 | 6.2%
mono | 2865 | 4.8%
india mark ii | 2400 | 4.0%
afridev | 1770 | 3.0%
ksb | 1415 | 2.4%
other - rope pump | 451 | 0.8%
Other values (8) | 701 | 1.2%

Length

Max length: 25
Median length: 7
Mean length: 7.719511785
Min length: 3

extraction_type_group
Categorical

HIGH CORRELATION

Distinct count: 13
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
gravity | 26780 | 45.1%
nira/tanira | 8154 | 13.7%
other | 6430 | 10.8%
submersible | 6179 | 10.4%
swn 80 | 3670 | 6.2%
mono | 2865 | 4.8%
india mark ii | 2400 | 4.0%
afridev | 1770 | 3.0%
rope pump | 451 | 0.8%
other handpump | 364 | 0.6%
Other values (3) | 337 | 0.6%

Length

Max length: 15
Median length: 7
Mean length: 7.880538721
Min length: 4

extraction_type_class
Categorical

HIGH CORRELATION

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
gravity | 26780 | 45.1%
handpump | 16456 | 27.7%
other | 6430 | 10.8%
submersible | 6179 | 10.4%
motorpump | 2987 | 5.0%
rope pump | 451 | 0.8%
wind-powered | 117 | 0.2%

Length

Max length: 12
Median length: 7
Mean length: 7.602239057
Min length: 5

management
Categorical

HIGH CORRELATION

Distinct count: 12
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
vwc | 40507 | 68.2%
wug | 6515 | 11.0%
water board | 2933 | 4.9%
wua | 2535 | 4.3%
private operator | 1971 | 3.3%
parastatal | 1768 | 3.0%
water authority | 904 | 1.5%
other | 844 | 1.4%
company | 685 | 1.2%
unknown | 561 | 0.9%
Other values (2) | 177 | 0.3%

Length

Max length: 16
Median length: 3
Mean length: 4.350639731
Min length: 3

management_group
Categorical

HIGH CORRELATION

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
user-group | 52490 | 88.4%
commercial | 3638 | 6.1%
parastatal | 1768 | 3.0%
other | 943 | 1.6%
unknown | 561 | 0.9%

Length

Max length: 10
Median length: 10
Mean length: 9.892289562
Min length: 5

payment
Categorical

HIGH CORRELATION

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
never pay | 25348 | 42.7%
pay per bucket | 8985 | 15.1%
pay monthly | 8300 | 14.0%
unknown | 8157 | 13.7%
pay when scheme fails | 3914 | 6.6%
pay annually | 3642 | 6.1%
other | 1054 | 1.8%

Length

Max length: 21
Median length: 9
Mean length: 10.66479798
Min length: 5

payment_type
Categorical

HIGH CORRELATION

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
never pay | 25348 | 42.7%
per bucket | 8985 | 15.1%
monthly | 8300 | 14.0%
unknown | 8157 | 13.7%
on failure | 3914 | 6.6%
annually | 3642 | 6.1%
other | 1054 | 1.8%

Length

Max length: 10
Median length: 9
Mean length: 8.530757576
Min length: 5

water_quality
Categorical

HIGH CORRELATION

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
soft | 50818 | 85.6%
salty | 4856 | 8.2%
unknown | 1876 | 3.2%
milky | 804 | 1.4%
coloured | 490 | 0.8%
salty abandoned | 339 | 0.6%
fluoride | 200 | 0.3%
fluoride abandoned | 17 | < 0.1%

Length

Max length: 18
Median length: 4
Mean length: 4.303282828
Min length: 4

quality_group
Categorical

HIGH CORRELATION

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
good | 50818 | 85.6%
salty | 5195 | 8.7%
unknown | 1876 | 3.2%
milky | 804 | 1.4%
colored | 490 | 0.8%
fluoride | 217 | 0.4%

Length

Max length: 8
Median length: 4
Mean length: 4.23510101
Min length: 4

quantity
Categorical

HIGH CORRELATION

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
enough | 33186 | 55.9%
insufficient | 15129 | 25.5%
dry | 6246 | 10.5%
seasonal | 4050 | 6.8%
unknown | 789 | 1.3%

Length

Max length: 12
Median length: 6
Mean length: 7.362373737
Min length: 3

quantity_group
Categorical

HIGH CORRELATION

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
enough | 33186 | 55.9%
insufficient | 15129 | 25.5%
dry | 6246 | 10.5%
seasonal | 4050 | 6.8%
unknown | 789 | 1.3%

Length

Max length: 12
Median length: 6
Mean length: 7.362373737
Min length: 3

source
Categorical

HIGH CORRELATION

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
spring | 17021 | 28.7%
shallow well | 16824 | 28.3%
machine dbh | 11075 | 18.6%
river | 9612 | 16.2%
rainwater harvesting | 2295 | 3.9%
hand dtw | 874 | 1.5%
lake | 765 | 1.3%
dam | 656 | 1.1%
other | 212 | 0.4%
unknown | 66 | 0.1%

Length

Max length: 20
Median length: 11
Mean length: 8.978804714
Min length: 3

source_type
Categorical

HIGH CORRELATION

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
spring | 17021 | 28.7%
shallow well | 16824 | 28.3%
borehole | 11949 | 20.1%
river/lake | 10377 | 17.5%
rainwater harvesting | 2295 | 3.9%
dam | 656 | 1.1%
other | 278 | 0.5%

Length

Max length: 20
Median length: 8
Mean length: 9.303602694
Min length: 3

source_class
Categorical

HIGH CORRELATION

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
groundwater | 45794 | 77.1%
surface | 13328 | 22.4%
unknown | 278 | 0.5%

Length

Max length: 11
Median length: 11
Mean length: 10.08377104
Min length: 7

waterpoint_type
Categorical

HIGH CORRELATION

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
communal standpipe | 28522 | 48.0%
hand pump | 17488 | 29.4%
other | 6380 | 10.7%
communal standpipe multiple | 6103 | 10.3%
improved spring | 784 | 1.3%
cattle trough | 116 | 0.2%
dam | 7 | < 0.1%

Length

Max length: 27
Median length: 18
Mean length: 14.82757576
Min length: 3

waterpoint_type_group
Categorical

HIGH CORRELATION

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 464.1 KiB

Value | Count | Frequency (%)
communal standpipe | 34625 | 58.3%
hand pump | 17488 | 29.4%
other | 6380 | 10.7%
improved spring | 784 | 1.3%
cattle trough | 116 | 0.2%
dam | 7 | < 0.1%

Length

Max length: 18
Median length: 18
Mean length: 13.90287879
Min length: 3

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
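
As a minimal sketch of this calculation, the function below computes r in plain Python. The name `pearson_r` and the sample values are illustrative only and are not taken from the report or from pandas-profiling; the 1/n factors of the covariance and the standard deviations cancel, so sums of deviations are used directly.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of (co)deviations; the common 1/n factor cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: y is an exact increasing linear function of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
print(pearson_r(x, y))
# Shifting and rescaling x should leave r unchanged (up to rounding).
print(pearson_r([10 * a - 4 for a in x], y))
```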

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
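
The rank-based calculation can be sketched as follows. The helper names `average_ranks` and `spearman_rho` are illustrative, and assigning tied values the average of their positions is an assumption of this sketch rather than something the report specifies.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation applied to the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: y = x**3 is monotonic but not linear in x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```

Because the ranks of x and y coincide in this example, ρ reaches its maximum even though the relationship is not linear.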

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
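
The pair-counting definition above can be sketched directly. This follows the formula exactly as stated (concordant minus discordant, over all pairs); tied pairs count as neither, which is an assumption of this sketch, and library implementations may use a tie-adjusted variant instead. The name `kendall_tau` is illustrative.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs of observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product says whether the pair is ordered the same way in x and y.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie in x or y: counted as neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Hypothetical data: one of the three pairs is discordant.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```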

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik package documentation.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
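
A minimal sketch of a bias-corrected Cramér's V is given below, using the correction usually attributed to Bergsma (2013): the chi-squared statistic divided by n is reduced by (k-1)(r-1)/(n-1), and the numbers of rows and columns are shrunk in the same spirit. The function name, the label data, and the choice to return 0.0 when a variable is constant are assumptions of this sketch, not the report's own code.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # assumption: treat a constant variable as "no association"
    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias correction of phi^2 and of the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)

# Hypothetical labels: the second column is a pure relabelling of the first.
first = ["pay monthly"] * 50 + ["never pay"] * 50
second = ["monthly"] * 50 + ["never pay"] * 50
print(cramers_v_corrected(first, second))
```

A pair of columns where one is a relabelling of the other, as in the high-correlation warnings above, is the kind of input on which this measure approaches 1.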

Missing values

Sample

First rows

index | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group
0 | 69572 | 6000.0 | 2011-03-14 | Roman | 1390 | Roman | 34.938093 | -9.856322 | none | 0 | Lake Nyasa | Mnyusi B | Iringa | 11 | 5 | Ludewa | Mundindi | 109 | True | GeoData Consultants Ltd | VWC | Roman | False | 1999 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe
1 | 8776 | 0.0 | 2013-03-06 | Grumeti | 1399 | GRUMETI | 34.698766 | -2.147466 | Zahanati | 0 | Lake Victoria | Nyamara | Mara | 20 | 2 | Serengeti | Natta | 280 | <NA> | GeoData Consultants Ltd | Other | NaN | True | 2010 | gravity | gravity | gravity | wug | user-group | never pay | never pay | soft | good | insufficient | insufficient | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe
2 | 34310 | 25.0 | 2013-02-25 | Lottery Club | 686 | World vision | 37.460664 | -3.821329 | Kwa Mahundi | 0 | Pangani | Majengo | Manyara | 21 | 4 | Simanjiro | Ngorika | 250 | True | GeoData Consultants Ltd | VWC | Nyumba ya mungu pipe scheme | True | 2009 | gravity | gravity | gravity | vwc | user-group | pay per bucket | per bucket | soft | good | enough | enough | dam | dam | surface | communal standpipe multiple | communal standpipe
3 | 67743 | 0.0 | 2013-01-28 | Unicef | 263 | UNICEF | 38.486161 | -11.155298 | Zahanati Ya Nanyumbu | 0 | Ruvuma / Southern Coast | Mahakamani | Mtwara | 90 | 63 | Nanyumbu | Nanyumbu | 58 | True | GeoData Consultants Ltd | VWC | NaN | True | 1986 | submersible | submersible | submersible | vwc | user-group | never pay | never pay | soft | good | dry | dry | machine dbh | borehole | groundwater | communal standpipe multiple | communal standpipe
4 | 19728 | 0.0 | 2011-07-13 | Action In A | 0 | Artisan | 31.130847 | -1.825359 | Shuleni | 0 | Lake Victoria | Kyanyamisa | Kagera | 18 | 1 | Karagwe | Nyakasimbi | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | gravity | gravity | gravity | other | other | never pay | never pay | soft | good | seasonal | seasonal | rainwater harvesting | rainwater harvesting | surface | communal standpipe | communal standpipe
5 | 9944 | 20.0 | 2011-03-13 | Mkinga Distric Coun | 0 | DWE | 39.172796 | -4.765587 | Tajiri | 0 | Pangani | Moa/Mwereme | Tanga | 4 | 8 | Mkinga | Moa | 1 | True | GeoData Consultants Ltd | VWC | Zingibali | True | 2009 | submersible | submersible | submersible | vwc | user-group | pay per bucket | per bucket | salty | salty | enough | enough | other | other | unknown | communal standpipe multiple | communal standpipe
6 | 19816 | 0.0 | 2012-10-01 | Dwsp | 0 | DWSP | 33.362410 | -3.766365 | Kwa Ngomho | 0 | Internal | Ishinabulandi | Shinyanga | 17 | 3 | Shinyanga Rural | Samuye | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump
7 | 54551 | 0.0 | 2012-10-09 | Rwssp | 0 | DWE | 32.620617 | -4.226198 | Tushirikiane | 0 | Lake Tanganyika | Nyawishi Center | Shinyanga | 17 | 3 | Kahama | Chambo | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | wug | user-group | unknown | unknown | milky | milky | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump
8 | 53934 | 0.0 | 2012-11-03 | Wateraid | 0 | Water Aid | 32.711100 | -5.146712 | Kwa Ramadhan Musa | 0 | Lake Tanganyika | Imalauduki | Tabora | 14 | 6 | Tabora Urban | Itetemia | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | india mark ii | india mark ii | handpump | vwc | user-group | never pay | never pay | salty | salty | seasonal | seasonal | machine dbh | borehole | groundwater | hand pump | hand pump
9 | 46144 | 0.0 | 2011-08-03 | Isingiro Ho | 0 | Artisan | 30.626991 | -1.257051 | Kwapeto | 0 | Lake Victoria | Mkonomre | Kagera | 18 | 1 | Karagwe | Kaisho | 0 | True | GeoData Consultants Ltd | NaN | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump

Last rows

|  | id | amount_tsh | date_recorded | funder | gps_height | installer | longitude | latitude | wpt_name | num_private | basin | subvillage | region | region_code | district_code | lga | ward | population | public_meeting | recorded_by | scheme_management | scheme_name | permit | construction_year | extraction_type | extraction_type_group | extraction_type_class | management | management_group | payment | payment_type | water_quality | quality_group | quantity | quantity_group | source | source_type | source_class | waterpoint_type | waterpoint_type_group |
| 59390 | 13677 | 0.0 | 2011-08-04 | Rudep | 1715 | DWE | 31.370848 | -8.258160 | Kwa Mzee Atanas | 0 | Lake Tanganyika | Kitonto | Rukwa | 15 | 2 | Sumbawanga Rural | Mkowe | 150 | True | GeoData Consultants Ltd | VWC | NaN | False | 1991 | swn 80 | swn 80 | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | machine dbh | borehole | groundwater | hand pump | hand pump |
| 59391 | 44885 | 0.0 | 2013-08-03 | Government Of Tanzania | 540 | Government | 38.044070 | -4.272218 | Kwa | 0 | Pangani | Maore Kati | Kilimanjaro | 3 | 3 | Same | Maore | 210 | True | GeoData Consultants Ltd | Water authority | Hingilili | True | 1967 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe |
| 59392 | 40607 | 0.0 | 2011-04-15 | Government Of Tanzania | 0 | Government | 33.009440 | -8.520888 | Benard Charles | 0 | Lake Rukwa | Mbuyuni A | Mbeya | 12 | 1 | Chunya | Mbuyuni | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | gravity | gravity | gravity | vwc | user-group | never pay | never pay | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe |
| 59393 | 48348 | 0.0 | 2012-10-27 | Private | 0 | Private | 33.866852 | -4.287410 | Kwa Peter | 0 | Internal | Masanga | Tabora | 14 | 2 | Igunga | Igunga | 0 | False | GeoData Consultants Ltd | Water authority | NaN | False | 0 | gravity | gravity | gravity | private operator | commercial | pay per bucket | per bucket | soft | good | insufficient | insufficient | dam | dam | surface | other | other |
| 59394 | 11164 | 500.0 | 2011-03-09 | World Bank | 351 | ML appro | 37.634053 | -6.124830 | Chimeredya | 0 | Wami / Ruvu | Komstari | Morogoro | 5 | 6 | Mvomero | Diongoya | 89 | True | GeoData Consultants Ltd | VWC | NaN | True | 2007 | submersible | submersible | submersible | vwc | user-group | pay monthly | monthly | soft | good | enough | enough | machine dbh | borehole | groundwater | communal standpipe | communal standpipe |
| 59395 | 60739 | 10.0 | 2013-05-03 | Germany Republi | 1210 | CES | 37.169807 | -3.253847 | Area Three Namba 27 | 0 | Pangani | Kiduruni | Kilimanjaro | 3 | 5 | Hai | Masama Magharibi | 125 | True | GeoData Consultants Ltd | Water Board | Losaa Kia water supply | True | 1999 | gravity | gravity | gravity | water board | user-group | pay per bucket | per bucket | soft | good | enough | enough | spring | spring | groundwater | communal standpipe | communal standpipe |
| 59396 | 27263 | 4700.0 | 2011-05-07 | Cefa-njombe | 1212 | Cefa | 35.249991 | -9.070629 | Kwa Yahona Kuvala | 0 | Rufiji | Igumbilo | Iringa | 11 | 4 | Njombe | Ikondo | 56 | True | GeoData Consultants Ltd | VWC | Ikondo electrical water sch | True | 1996 | gravity | gravity | gravity | vwc | user-group | pay annually | annually | soft | good | enough | enough | river | river/lake | surface | communal standpipe | communal standpipe |
| 59397 | 37057 | 0.0 | 2011-04-11 | NaN | 0 | NaN | 34.017087 | -8.750434 | Mashine | 0 | Rufiji | Madungulu | Mbeya | 12 | 7 | Mbarali | Chimala | 0 | True | GeoData Consultants Ltd | VWC | NaN | False | 0 | swn 80 | swn 80 | handpump | vwc | user-group | pay monthly | monthly | fluoride | fluoride | enough | enough | machine dbh | borehole | groundwater | hand pump | hand pump |
| 59398 | 31282 | 0.0 | 2011-03-08 | Malec | 0 | Musa | 35.861315 | -6.378573 | Mshoro | 0 | Rufiji | Mwinyi | Dodoma | 1 | 4 | Chamwino | Mvumi Makulu | 0 | True | GeoData Consultants Ltd | VWC | NaN | True | 0 | nira/tanira | nira/tanira | handpump | vwc | user-group | never pay | never pay | soft | good | insufficient | insufficient | shallow well | shallow well | groundwater | hand pump | hand pump |
| 59399 | 26348 | 0.0 | 2011-03-23 | World Bank | 191 | World | 38.104048 | -6.747464 | Kwa Mzee Lugawa | 0 | Wami / Ruvu | Kikatanyemba | Morogoro | 5 | 2 | Morogoro Rural | Ngerengere | 150 | True | GeoData Consultants Ltd | VWC | NaN | True | 2002 | nira/tanira | nira/tanira | handpump | vwc | user-group | pay when scheme fails | on failure | salty | salty | enough | enough | shallow well | shallow well | groundwater | hand pump | hand pump |